A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the
problem is defined at the pixel level, determining image class labels alone is
not sufficient; localising them at the original image pixel resolution is also
necessary. Boosted by the extraordinary ability of convolutional neural
networks (CNNs) to create semantic, high-level, hierarchical image
features, a large number of deep learning-based 2D semantic segmentation
approaches have been proposed within the last decade. In this survey, we mainly
focus on the recent scientific developments in semantic segmentation,
specifically on deep learning-based methods using 2D images. We start with an
analysis of the public image sets and leaderboards for 2D semantic
segmentation, together with an overview of the techniques employed in performance
evaluation. In examining the evolution of the field, we chronologically
categorise the approaches into three main periods, namely the pre- and early deep
learning era, the fully convolutional era, and the post-FCN era. We technically
analyse the solutions put forward for the fundamental problems
of the field, such as fine-grained localisation and scale invariance. Before
drawing our conclusions, we present a table of methods from all mentioned eras,
with a brief summary of each approach that explains their contribution to the
field. We conclude the survey by discussing the current challenges of the field
and to what extent they have been solved.
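The performance-evaluation techniques the survey reviews typically centre on the mean intersection-over-union (mIoU) metric, the de facto standard for 2D semantic segmentation benchmarks. A minimal NumPy sketch (illustrative, not taken from the survey) of per-class IoU averaged over the classes present:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two pixel-wise label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```

Leaderboards usually report this averaged over a held-out test set, sometimes alongside pixel accuracy and frequency-weighted variants.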
A Hybrid Framework for Matching Printing Design Files to Product Photos
We propose a real-time image matching framework, which is hybrid in the sense
that it uses both hand-crafted features and deep features obtained from a
well-tuned deep convolutional network. The matching problem we concentrate on
is specific to a certain application, namely matching printing design files to
product photos. Printing designs are template image files of any kind, created
using a design tool, and are therefore perfect image signals. However,
photographs of a printed product suffer from many unwanted effects, such as
uncontrolled shooting angle, uncontrolled illumination, occlusions, printing
deficiencies in color, camera noise, optic blur, et cetera. To study this
problem, we create an image set that includes printing designs and their
corresponding product photos, in collaboration with an actual printing facility.
Using this image set, we benchmark various hand-crafted and deep features for
matching performance and propose a framework in which deep learning makes the
highest contribution while real-time operation on an ordinary desktop computer
is preserved.
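The abstract does not specify how the hand-crafted and deep descriptors are fused, but a common scheme such a hybrid framework could use is a weighted combination of per-descriptor similarities. A hypothetical sketch (the feature names, weight, and cosine-similarity fusion are assumptions, not the paper's method):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_score(design, photo, w_deep=0.7):
    """Fuse a hand-crafted descriptor with a deep descriptor (weight is illustrative)."""
    s_hand = cosine(design["hand"], photo["hand"])
    s_deep = cosine(design["deep"], photo["deep"])
    return (1.0 - w_deep) * s_hand + w_deep * s_deep

def best_match(photo, designs, w_deep=0.7):
    """Return the index of the design file that best matches a product photo."""
    scores = [match_score(d, photo, w_deep) for d in designs]
    return int(np.argmax(scores))
```

Precomputing the design-file descriptors offline is what would keep such a pipeline real-time on a desktop machine.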
Filter design for small target detection on infrared imagery using normalized-cross-correlation layer
In this paper, we introduce a machine learning approach to the problem of
filter design for infrared small target detection. For this purpose, we propose
the normalized-cross-correlation (NCC) layer, analogous to a convolutional
layer of a neural network, which we utilize for designing a target
detection/recognition filter bank. By employing the NCC layer in a neural network
structure, we introduce a framework, in which supervised training is used to
calculate the optimal filter shape and the optimum number of filters required
for a specific target detection/recognition task on infrared images. We also
propose the mean-absolute-deviation NCC (MAD-NCC) layer, an efficient
implementation of the proposed NCC layer, designed especially for FPGA systems,
in which square root operations are avoided for real-time computation. As a
case study, we work on dim-target detection in mid-wave infrared imagery and
obtain filters that can discriminate a dim target from various types of
background clutter specific to our operational concept.
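The forward pass of an NCC layer differs from a convolutional layer in that both the filter and each image patch are mean-subtracted and norm-divided before correlation, making the response invariant to local brightness and contrast. A naive NumPy sketch of the idea (the paper's trainable/FPGA implementation will differ; the MAD-NCC variant replaces the square-root norms with mean absolute deviations):

```python
import numpy as np

def ncc_layer(image, filters, eps=1e-8):
    """Normalized cross-correlation of each filter with every image patch (valid mode)."""
    fh, fw = filters.shape[1:]
    H, W = image.shape
    out = np.zeros((filters.shape[0], H - fh + 1, W - fw + 1))
    # Zero-mean, unit-norm filters, computed once.
    f = filters - filters.mean(axis=(1, 2), keepdims=True)
    f = f / (np.sqrt((f ** 2).sum(axis=(1, 2), keepdims=True)) + eps)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = image[i:i + fh, j:j + fw]
            p = patch - patch.mean()
            p_norm = np.sqrt((p ** 2).sum()) + eps
            out[:, i, j] = (f * (p / p_norm)).sum(axis=(1, 2))
    return out
```

Responses lie in [-1, 1], with 1 indicating a patch that matches the filter shape exactly up to gain and offset.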
Defining Image Memorability using the Visual Memory Schema
Memorability of an image is a characteristic determined by human observers' ability to remember images they have seen. Yet recent work on image memorability defines it as an intrinsic property that can be obtained independently of the observer. The current study aims to enhance our understanding and prediction of image memorability, improving upon existing approaches by incorporating the properties of cumulative human annotations. We propose a new concept called the Visual Memory Schema (VMS), referring to an organization of image components that human observers share when encoding and recognizing images. The concept of VMS is operationalised by asking human observers to define memorable regions of images they were asked to remember during an episodic memory test. We then statistically assess the consistency of VMSs across observers for both correctly and incorrectly recognised images. The associations of the VMSs with eye fixations and saliency are analysed separately as well. Lastly, we adapt various deep learning architectures for the reconstruction and prediction of memorable regions in images and analyse the results when using transfer learning at the outputs of different convolutional network layers.
Scale-Space Approach for the Comparison of HK and SC Curvature Descriptions as Applied to Object Recognition
Using mean curvature (H) and Gaussian curvature (K) values, or shape index (S) and curvedness (C) values, HK and SC curvature spaces are constructed in order to classify surface patches into types such as pits, peaks, saddles, etc. Since both HK and SC curvature spaces classify surface patches into similar types, their classification capabilities are comparable. Previously, HK and SC curvature spaces were compared in terms of their classification ability only at the given data resolution [2]. When calculating H, K, C and S values, the scale/resolution ratio is highly influential. However, due to its scale-invariant nature, the shape index (S) is independent of resolution or scale. Thus it is unsurprising that the SC method gives better results than the HK method when the comparison is carried out at an uncontrolled scale/resolution level. In this study, the scale/resolution ratio is set to a constant value for the whole database, and scale spaces based on both the HK and SC methods are built. Scale- and orientation-invariant features are extracted using these scale spaces, and these features are used in object recognition tasks. The methods are compared both mathematically and experimentally in terms of their surface classification and object recognition performances.
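The two descriptions are related in closed form: the principal curvatures follow from H and K as k1,2 = H ± sqrt(H² − K), from which Koenderink's shape index and curvedness are obtained. A small sketch of that conversion (using the standard definitions, which is an assumption about the exact convention the paper follows):

```python
import math

def hk_to_sc(H, K):
    """Shape index (S) and curvedness (C) from mean (H) and Gaussian (K) curvature.

    Uses Koenderink's convention; S is degenerate for planar points (k1 == k2 == 0).
    """
    disc = math.sqrt(max(H * H - K, 0.0))       # (k1 - k2) / 2
    k1, k2 = H + disc, H - disc                 # principal curvatures, k1 >= k2
    C = math.sqrt((k1 * k1 + k2 * k2) / 2.0)    # curvedness: magnitude of bending
    S = (2.0 / math.pi) * math.atan2(k1 + k2, k1 - k2)  # shape index in [-1, 1]
    return S, C
```

Here S = 1 corresponds to a dome/peak, S = -1 to a cup/pit, and S = 0 to a symmetric saddle; C carries the scale-dependent magnitude, which is why S alone is scale invariant.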
Simulation of Turkish lip motion and facial expressions in a 3D environment and synchronization with a Turkish speech engine
In this thesis, 3D animation of human facial expressions and lip motion, and their synchronization with a Turkish speech engine using the Java programming language, the Java3D API and the Java Speech API, is analyzed. A three-dimensional animation model for simulating Turkish lip motion and facial expressions is developed. In addition to lip motion, synchronization with a Turkish speech engine is achieved. The output of the study is facial expressions and Turkish lip motion synchronized with Turkish speech, where the input is Turkish text in Java Speech Markup Language (JSML) format, which also indicates the expressions. The animation is created using the Java3D API. 3D facial models corresponding to different lip positions of the same person are morphed into each other to construct the animation. Moreover, simulations of human facial expressions of emotions are created within the animation. An expression weight parameter, which states the weight of the given expression, is introduced. The synchronization of lip motion with Turkish speech is achieved via CloudGarden(R)'s Java Speech API interface [2]. The "Levent16k SAPI 4-5 Male Voice" of the G-V.S Voice Technologies Software Firm was used as the Turkish speech engine [3]. Finally, a virtual Turkish speaker with facial expressions of emotions is created as a Java3D animation.
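Morphing between facial models with an expression weight parameter is, in its simplest form, a per-vertex linear blend between aligned meshes. A minimal sketch of that idea in Python (an assumption for illustration; the thesis implements its morphing in Java3D and may use a different interpolation):

```python
import numpy as np

def morph(base_vertices, target_vertices, weight):
    """Linear per-vertex morph between two aligned 3D vertex sets.

    `weight` plays the role of an expression weight parameter in [0, 1]:
    0 yields the base model, 1 the target model.
    """
    weight = float(np.clip(weight, 0.0, 1.0))
    return (1.0 - weight) * base_vertices + weight * target_vertices
```

Driving `weight` over time between key lip positions, in step with phoneme timings from the speech engine, produces the synchronized animation.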
3D Data Processing for Enhancement of Face Scanner Data
The data acquired by 3D face scanners have distortions such as spikes, holes and noise. Enhancing 3D face data by removing these distortions while preserving facial features is important for the applications using these data. In this study, thresholding is used for removing spikes; thresholding, together with face symmetry, is used for hole filling; and bilateral filtering is used for smoothing. Satisfactory results are obtained on FRGC 3D face data.
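Spike removal by thresholding can be sketched as flagging depth values that deviate from a local statistic by more than a threshold. The version below compares each sample against its local median (a simple stand-in for the paper's thresholding, whose exact criterion is not given in the abstract):

```python
import numpy as np

def remove_spikes(depth, window=3, thresh=10.0):
    """Replace depth values deviating from the local median by more than
    `thresh` (spikes) with that median."""
    pad = window // 2
    padded = np.pad(depth, pad, mode="edge")
    out = depth.astype(float).copy()
    for i in range(depth.shape[0]):
        for j in range(depth.shape[1]):
            med = np.median(padded[i:i + window, j:j + window])
            if abs(depth[i, j] - med) > thresh:
                out[i, j] = med
    return out
```

Hole filling and bilateral smoothing would follow as separate passes; bilateral filtering in particular smooths noise while keeping depth discontinuities (facial features) intact.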
3D face modeling using multiple images
3D face modeling based on real images is one of the important subjects of Computer Vision that has been studied recently. In this paper, the study conducted in our Computer Vision and Intelligent Systems Research Laboratory on 3D face model generation using uncalibrated multiple still images is explained.
Scale invariant representation of 2.5D data
In this paper, a scale- and orientation-invariant feature representation for 2.5D objects is introduced, which may be used to classify, detect and recognize objects even in the presence of clutter and/or occlusion. With this representation, a 2.5D object is defined by an attributed graph structure, in which the nodes are the pit and peak regions on the surface. The attributes of the graph are the scales, positions and normals of these pits and peaks. In order to detect these regions, a "peakness" (or "pitness") measure is defined based on Gaussian curvature calculation, which is performed at various scales on the surface. Finally, a "position vs. scale" feature volume is obtained, and the graph nodes are extracted from this feature space by volume segmentation techniques.
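For a 2.5D height field z(x, y), the Gaussian curvature underlying such a peakness measure is K = (z_xx·z_yy − z_xy²) / (1 + z_x² + z_y²)², evaluated after smoothing the surface at each scale. A NumPy-only sketch of computing per-scale curvature maps (the paper's actual peakness measure and volume segmentation are not reproduced here):

```python
import numpy as np

def gaussian_blur(z, sigma):
    """Separable Gaussian smoothing using plain NumPy convolutions."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    z = np.apply_along_axis(lambda row: np.convolve(row, g, mode="same"), 1, z)
    return np.apply_along_axis(lambda col: np.convolve(col, g, mode="same"), 0, z)

def gaussian_curvature(z):
    """Gaussian curvature K of a height field z(x, y) via finite differences."""
    zy, zx = np.gradient(z)          # first derivatives (axis 0 = y, axis 1 = x)
    zxy, zxx = np.gradient(zx)       # second derivatives of zx
    zyy, _ = np.gradient(zy)
    return (zxx * zyy - zxy ** 2) / (1.0 + zx ** 2 + zy ** 2) ** 2

def curvature_across_scales(z, sigmas=(1.0, 2.0, 4.0)):
    """Stack of per-scale Gaussian-curvature maps; K > 0 at elliptic (pit/peak) points."""
    return np.stack([gaussian_curvature(gaussian_blur(z, s)) for s in sigmas])
```

Thresholding K > 0 at each scale, and tracking where a region's response persists in the resulting "position vs. scale" volume, gives the graph nodes and their scale attributes.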